RETRIEVING A REPLICA OF AN ELECTRONIC DOCUMENT IN A COMPUTER
NETWORK



FIELD OF THE INVENTION 

The present invention relates generally to replica access in a computer network. More
particularly, the present invention relates to retrieving and/or depositing a replica of an
electronic document in a computer network.



BACKGROUND OF THE INVENTION

Instant access to electronic documents and data is becoming increasingly critical for
day-to-day business operations. As a result, storage needs to be reliable and resilient to
failures, including localized physical damage. Distributed, replicated storage over a computer
network seems to be the only way out.



Unfortunately, today's distributed/replicated systems either require fully identical replication
between the computing entities involved, which typically are at least two data centers in
different locations, or require, in the case of distributed storage, a centralized controller
keeping track of the replica distribution. Anyone who is to access more than one replica needs
to either know the full list of replicas or needs to have access to a directory service which
returns this information, either globally - for all documents - or on a per-document basis.

Distributed storage becomes increasingly more important, as existing inexpensive machines
can be used to serve content. With the advent of distributed hash table (DHT) technology,
self-organizing storage networks have become feasible and have raised significant interest in
the community. Sitting "on top" of the Internet, these scalable overlay networks use the
transport capabilities of the underlying network, but add value. DHT technology provides a
mapping from resource IDs to hosts (D->H) that is typically preceded by a mapping from
resource name to resource ID (N->D). This is achieved using minimal routing information in
each node. DHTs generally are also prepared to deal with changes in host availability and
network connectivity.

30/06 '03 MO 15:56 FAX +41 1 724 89 51 IBM ZURICH IPD EPOl PATENTS

CH920030035



DHTs come in a variety of routing flavors, but share the property that messages are
transported on a hop-by-hop basis among constituent nodes of the overlay network. Each hop
knows how to get closer to the destination, until it finally reaches the node that claims the
requested ID as its own and acts according to the request.

Some of the DHTs operate based on intervals in ring topologies, such as described in "Chord: A
Scalable Peer-to-peer Lookup Service for Internet Applications", Ion Stoica et al., Proceedings
of ACM SIGCOMM 2001, August 2001, pages 149 - 160, some split hyperspaces into
manageable chunks, as described in "A Scalable Content-Addressable Network", Sylvia
Ratnasamy et al., Proceedings of ACM SIGCOMM, September 2001, or "Efficient
Topology-Aware Overlay Network", Marcel Waldvogel and Roberto Rinaldi, ACM
Computer Communications Review, January 2003, Volume 33, Number 1, pages 101 - 106,
whereas others implement a rootless tree, such as described in "Pastry: Scalable, distributed
object location and routing for large-scale peer-to-peer systems", Anthony Rowstron and Peter
Druschel, IFIP/ACM International Conference on Distributed Systems Platforms
(Middleware), November 2001, pages 329 - 350, or "Tapestry: An Infrastructure for
Fault-tolerant Wide-area Location and Routing", Ben Y. Zhao et al., University of California,
Berkeley, UCB/CSD-01-1141, April 2001.

Many of these DHT systems are able to exploit the locality of the underlay network. Locality
aspects are typically separated into geographic layout and proximity forwarding, categories
adapted from "Exploiting Network Proximity in Distributed Hash Tables", Miguel Castro et
al., International Workshop on Future Directions in Distributed Computing (FuDiCo), edited
by Ozalp Babaoglu and Ken Birman and Keith Marzullo, June 2002, pages 52 - 55.

"Accessing Nearby Copies of Replicated Objects in a Distributed Environment", C. Greg
Plaxton et al., ACM Symposium on Parallel Algorithms and Architectures, 1997, pages 311 -
320, shows another approach to locality patterns.




An approach of linking DHTs and caching is shown in "OceanStore: An Architecture for
Global-Scale Persistent Storage", John Kubiatowicz et al., Proceedings of ACM ASPLOS,
November 2000. There, queries passing along the DHT are redirected by Attenuated Bloom
Filters (ABF), when there is a high probability that a document cache can be found along that
route. Besides the chances for false positives despite continuous ABF update traffic, there is
no way for the document originator to address selected replicas when the need arises.

"INS/Twine: A Scalable Peer-to-Peer Architecture for Intentional Resource Discovery",
Magdalena Balazinska et al., Pervasive 2002 - International Conference on Pervasive
Computing, August 2002, shows an example of a resource discovery/directory service on top
of a DHT.

US20020114341A1 presents a peer-to-peer enterprise storage, which uses a centralized
controller/coordinator.



Applicant's US 6,223,206 discloses a method and system for load balancing by replicating a
portion of a file being read by a first stream onto a second device and reading the portion with
a second stream capable of accessing. This prior art deals with a completely centralized
system.



US20030014523A1, US20030014433A1, and US20030014432A1 each introduce a storage
network data replicator. There are algorithms disclosed on how to replicate from one instance
to the other. It is described which existing replica to select as a source for further replication.



US 6,467,046 and EP 807 885 B1 both show a system and a method for automatically
distributing copies of a replicated database in a computer system. Hosts and disks for
determining replica placement are enumerated in order to improve reliability.

US 5,815,649 illustrates a distributed fault tolerant digital data storage subsystem for fault
tolerant computer system. Multiple redundant computers are used as a front-end to multiple
redundant disks, basically as a network RAID (Redundant Array of Inexpensive Disks).

According to US 6,470,420, a method is proposed for designating one of a plurality of
addressable storage devices to process a data transfer request. A client multicasts a single
request to all replicas and they cooperatively select the one to reply.

WO 03/012699 A1 shows systems and methods for providing metadata for tracking of
information on a distributed file system of storage devices. Metadata are used to locate the
files.

In US 6,163,836 a method and an apparatus are shown for file system disaster recovery.

According to another applicant's patent US 5,897,661, there is illustrated a logical volume
manager and a corresponding method for having enhanced update capability with dynamic
allocation of storage and minimal storage of metadata information. Metadata replication is
provided, which is limited to those storage providers who have a need to know.

In WO 02/093298 A3, a modular storage server architecture is described with dynamic data
management. This document shows replication according to locality access patterns and
hierarchical storage management.

According to US20030028695A1, a producer/consumer locking system for efficient
replication of file data is shown which provides locking between concurrent operations.

According to US 5,588,147, a replication facility is described, which uses a log file
mechanism to replicate documents.

Despite the work done on replication and distributed storage, there currently is a lack of a
replication mechanism on top of the completely distributed technology which does not suffer
from the presence of single points of failure. As replicas not only improve availability, but
may also balance load, there have been distributed mechanisms also for the purpose of
caching. Besides reliability, caching systems also pose an update problem: As it is not clear
where information is cached, a cache may become stale if it does not continuously track the
status of the original location. This poses a severe challenge in scalability, undoing the
offloading caching provides.

Hence, it is desirable to provide a mechanism for managing replicas in a computer network,
which mechanism is reflected in appropriate methods, computing entities and computer
program elements for retrieving and/or depositing replicas in a computer network.



SUMMARY OF THE INVENTION

The present invention provides a method for retrieving a replica of an electronic document in
a computer network. At least one replica number is selected and a given function is applied.
The function requires the replica number and a document identifier associated to the
electronic document as input. By applying the function k times with k different replica
numbers as input, k entity identifiers are determined, wherein each entity identifier represents
a computing entity in the network that might provide the replica. k is an integer number equal
to or greater than 1. Then, a document related request is addressed to at least one of the
identified entities.
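
The retrieval step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the claimed implementation: the use of SHA-256, the 32-bit identifier space, and the names `entity_id` and `candidate_entities` are chosen for this example only.

```python
import hashlib


def entity_id(document_id: str, replica_number: int, id_bits: int = 32) -> int:
    """Map a (document identifier, replica number) pair to an entity identifier.

    SHA-256 and the 32-bit identifier space are illustrative assumptions.
    """
    data = f"{document_id}:{replica_number}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") % (1 << id_bits)


def candidate_entities(document_id: str, replica_numbers):
    """Apply the function once per replica number, giving k entity identifiers."""
    return [entity_id(document_id, x) for x in replica_numbers]


# k = 3 replica numbers for a document with the identifier "D1"
print(candidate_entities("D1", [1, 2, 3]))
```

Because the function is deterministic, a depositing entity and a retrieving entity that use the same document identifier and replica number arrive at the same entity identifier without consulting a directory.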

This method is preferably automatically executed on a retrieving computing entity with access
to the network.

According to another aspect of the present invention, there is provided a method for
depositing a replica of an electronic document in a computer network. Again, a replica
number is selected and a given function is applied, the function requiring the replica number
and a document identifier as input. The output of this function is an entity identifier, the entity
identifier representing a computing entity in the network on which a replica with the chosen
replica number can be deposited. The identified entity is then addressed for replica depositing
or amending purposes. This method might also be used for depositing document related
amendments.

This method is preferably executed on a depositing or amending computing entity with access
to the network.



The idea of this approach is that replicas of an electronic document can only be stored at 
predefined addresses in a computer network. Such addresses are also called entity identifiers 
in the present context. The addresses are predefined by a function which provides for each 
replica number per document an address associated to a computing entity where the particular
replica can be found or can be stored. 



In a preferred embodiment, the function is a pseudo-random hash function, where each 
address/identifier is mapped to one of the machines/entities in the network using a distributed 
hash table. 
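
One way to picture the mapping from an address/identifier to a machine is a Chord-style ring, in which an identifier belongs to the first node whose own identifier is equal to or greater than it, wrapping around at the end of the ring. The sketch below assumes such a ring and a locally known list of node identifiers; a real DHT would resolve this by hop-by-hop routing instead. The name `responsible_node` is hypothetical.

```python
import bisect
from typing import Sequence


def responsible_node(identifier: int, node_ids: Sequence[int]) -> int:
    """Return the node that covers `identifier` on an assumed Chord-style ring."""
    ring = sorted(node_ids)
    if not ring:
        raise ValueError("the ring needs at least one node")
    # First node with an identifier >= the requested one ...
    index = bisect.bisect_left(ring, identifier)
    # ... wrapping around to the lowest node past the end of the ring.
    return ring[index % len(ring)]


print(responsible_node(150, [100, 200, 300]))
```

With this rule two hashed identifiers can fall into the range of the same node, so one entity may end up responsible for more than one replica of a document.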

By means of this function, replicas cannot randomly be added or deleted anywhere in the
network. Replicas can only be accessed or deposited at very defined locations/entities in the
network, which entities are determined by applying the function with a replica number as
input. Nevertheless, in case of applying a pseudo-random hash function, the pseudo-random
property of the hash function assures that the replicas will be evenly distributed. However,
other replicas can be disposed on entities in the network that follow another rule or function.
The replicas deposited by means of the function can then also serve as a fallback solution for
e.g. a centralized management system in case this system has a breakdown.

The function provides information for each electronic document at which entities in the
network replicas can be found or at which entities in the network replicas can be deposited.
For supporting this function, preferably a numbering system introduces replica numbers for
each document to be retrieved or to be stored, the replica numbers being preferably in a range
between 1 and N, with N characterizing in this case the highest replica number as well as the
maximum number of replicas allowed. After having a document identifier y and a replica
number x for this document identifier selected, the function then determines the entity
identifier indicating the entity that is chosen for replica number x of document y. In a writing
process, replica x of document y can then be sent to this entity and stored at this entity. In an
access process, replica number x of document y can then be found at this particular entity, but
not necessarily must be found there: There might not have been a need so far to have replica x
of document y deposited at the associated entity for the reason that e.g. so far a lower number
of replicas than x was sufficient to serve requests from retrieving entities.

Replicas of a document can only be retrieved at special addresses in the network determined
by the function, provided that the function is the same for depositing replicas of the same
document.

By preferably setting a limit for the number of replicas to be allowed in the network, retrieval
processes are optimized since the retrieving entity has a limited field of replica numbers to
base computations on and to look into if needed. By setting no or an excessive maximum
number of replicas, there might be a loss in retrieval time by trying to access entities where
actually no replicas are stored, since no one used this entity ever before for storing a replica
with a very high number.

With respect to the methods proposed, it is understood that there is no difference in
terminology between the original document itself and any replica. The original document
itself is preferably stored under any replica number and therefore might be addressed under
this replica number.

The underlying network might be a peer-to-peer network or a hierarchical network.

An electronic document might be any sort of electronic file or data or even database, copies
of which might be stored as replicas at different locations in the computer network. An
electronic document as presented herein might also be an active resource, such as a computer
program that will perform a stored action when accessed, or might also be a fragment of an
electronic file such as a storage unit, a sector or a cluster of sectors. Replicas might be stored
permanently at very specific locations e.g. for back-up purposes. Or they might be stored
temporarily, e.g. for caching purposes, reducing the load on the network or other replicas,
including the original document store, in particular when many users have a need to access the
underlying document. The invention supports distributed storage applications or document
repository storage applications, as well as distributed computing applications. Therefore, it is
preferred to have more points of access than only the original resource by establishing replicas
over the network to provide more computing capacity at the storing entities as well as more
network capacity in favor of the users. Replicas might also be beneficial from a single user's
point of view who wants his electronic documents accessible on many different computing
entities like e.g. a laptop, a handheld, a mobile phone, or a desktop computer. The invention
extends to all these applications of replicating electronic documents, but is not limited thereto.

The capabilities of the entities described in connection with the invention depend on the
intended use. Generally, a computing entity might be any unit being able to access the
network and communicate over the network. Some of the entities might primarily serve as a
deposit location for documents/replicas and are thus prepared to store huge amounts of data.
Such entities might in particular be server computers. However, every other computing entity
comprising an interface to the network and any sort of storage might serve as an entity for
providing replicas. Retrieving entities might be embodied as client computers or any other
type of computing entity, such as e.g. referenced above, or might also be computers of a
document administrator. Preferably, a computer can do both, retrieving and depositing
replicas.

According to the method of retrieving - i.e. locating - a replica of an electronic document,
addressing the identified entity that might provide the replica can be performed in different
ways. The request addressed to the entity might only make the addressed entity look up
whether this replica is actually available and tell the result to the retrieving entity. In another
embodiment, the request might also comprise the demand to send the replica to the retrieving
device immediately, or to add or to amend or to update or to modify data delivered together
with the request to the replica at the identified entity.

In case of depositing a replica at an identified entity, the request might only trigger a response
whether the addressed entity is ready for storing a replica, and initiating an internal check
therefore at the addressed entity. However, the request might also include the replica itself,
ready to be stored immediately at the addressed entity.

From a global view, the method for retrieving replicas makes the client - also called retrieving
entity - send only a small number of messages to locate a replica. As can be seen from
embodiments below, the client might locate a close or even the closest replica. For this to
work out, the selection of replicas is preferably limited. Because the replicas can thus be
enumerated and directly addressed, it becomes possible for a client to find the document
quickly. However, it is also beneficial for the document owner to find all replicas quickly, in
case the document has been or has to be updated or contents of the document have to be
verified or otherwise processed.

Furthermore, each replica needs to store only the tuple of a document ID and a document
content. A document content can be the content of the document itself, such as e.g. text or
graphics, but may instead also be a (set of) pointers to the node(s) that actually store the
document, and [a set of] replica index or replica entity ID. In case the same entity can be
responsible for multiple replicas of the same document, which can be the case as a function
such as a hash function randomly distributes the entity identifiers such that two or more of
them may fall within the range covered by a single node, an entity and thus the replica might
provide additional information in form of a replica index or replica entity ID which makes it
possible to distinguish different replicas on the same entity.

Taking the preferred embodiment of a limitation of replica numbers into account, the selection
of a replica number based on which the function can determine the identifier of the entity that
is associated to this particular replica number of the chosen document can be handled in
different ways. The selection can be a random selection of k replica numbers. However, the
selection can also cover all N replica numbers, wherein N is the maximum number of replicas,
and the preferred range of numbers might for example be [1, ..., N] or [0, ..., N-1]. In this
preferred embodiment, the given function is applied k = N times in order to determine k entity
identifiers which might provide access to a replica of the relevant document.

This embodiment gives full freedom to the retrieving entity to access a maximum number of
entities each of which might provide a replica, or to select some identified entities or only one
identified entity to address to. A selection scheme might follow an evaluation of the different
identified entities.

However, when resources are limited or when N is a big number, another approach might be
preferred: Only k replica numbers might be chosen out of a maximum N replicas, with k < N.
Consequently, only k entity identifiers are delivered by the function for determining such
entity identifiers.
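
A random choice of k replica numbers out of the range 1 to N can be sketched as follows. The function name `select_replica_numbers` and the optional `rng` parameter are assumptions introduced for this illustration; the text does not prescribe how the random selection is drawn.

```python
import random
from typing import List, Optional


def select_replica_numbers(n_max: int, k: int,
                           rng: Optional[random.Random] = None) -> List[int]:
    """Pick k distinct replica numbers from 1..n_max (k = n_max selects all)."""
    if not 1 <= k <= n_max:
        raise ValueError("k must satisfy 1 <= k <= N")
    rng = rng or random.Random()
    # sample() draws without replacement, so no replica number repeats.
    return sorted(rng.sample(range(1, n_max + 1), k))


print(select_replica_numbers(100, 5))
```

Passing a seeded `random.Random` instance makes the selection reproducible, which can be convenient when testing a retrieval strategy.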

This approach is beneficial for limiting the quantity of data which is the quantity of entity 
identifiers and in the end the quantity of entities which might be considered worth for 
addressing. 

Preferably, k might be chosen <= 5. This represents a reasonable range of entities which might
be addressed for retrieving a replica, in particular when all the identified entities are directly
addressed without selecting a subset of entities for this purpose. The rationale for this strategy
is explained below in more detail.

In another preferred embodiment k = 1, and thus, in this approach there is only one addressee
identified from the beginning and the request is addressed only to this one entity. In one
embodiment, the replica number might be selected randomly. In another embodiment, the
replica number is selected within the low numbers of the range of allowed replica numbers,
provided that replicas are distributed over the network with ascending replica numbers. Thus,
probability is increased to hit a replica at a first shot when looking for the replica at an entity
which provides a replica with a low replica number. However, this replica might be located
remote from the retrieving entity.

In a more parallel approach, the document related request is addressed to all identified entities.
This embodiment reflects the fact that not necessarily all identified entities really have to
have the requested replica available. Note that the entity assigned to a specific replica number
by way of the function only determines the location where a replica with this replica number
has to be deposited if this replica number is used for depositing the replica. However, there is
no necessity that any entity in the network had a need so far to really deposit a replica with
this number at this particular location/entity, since e.g. the number of replicas so far in use
was sufficient to cover the demand in the past.

However, when addressing a request to more than one identified entity, it is more likely to
receive information from at least one of the addressed entities that the replica in question is
available and ready for a download.

In another preferred embodiment, only selected ones of the identified entities are addressed
with the document related request. This embodiment allows a qualified selection of identified
entities for addressing purposes. As the process to identify entities providing replicas of a
document might not demand too much resources and thus be performed for the maximum N
replica numbers or at least for a major part of N, handling communication with many entities
might be cumbersome and a waste in time and resources. In order to minimize the number of
requests to be sent to identified entities, an evaluation scheme might be applied for
determining only the most promising entities out of the identified entities. The evaluation
whether an entity is considered to be promising can be based on different criteria.

In particular when intense communication to entities is not favored for some reason, an
evaluation/selection process might end up in selecting only one preferred entity to address the
document related request to.

A particularly preferred evaluation scheme can comprise a cost function for calculating a cost
value related to an entity. Such a cost value might indicate the cost to address the entity and/or
to communicate with this entity and/or to perform a download from this entity, wherein cost
can be defined e.g. as time and/or resources needed and/or another parameter which indicates
a preference or a drawback with regard to accessing the relevant entity. Preferably, the cost
value is calculated for each of the k identified entities. In a next step, the to be addressed
entities are selected from the identified entities according to the calculated cost values.
Preferably, only entities are selected to be addressed that show a low cost value. A threshold
might be introduced to determine the entities showing a low cost value. Or entities are
selected which show the lowest cost values out of the number of evaluated entities.
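
Both selection rules named above, a cost threshold and picking the lowest cost values, can be sketched with one helper. The function `select_by_cost`, its parameters, and the example cost figures are hypothetical; how the cost values themselves are obtained is left open here.

```python
from typing import Dict, List, Optional


def select_by_cost(costs: Dict[int, float],
                   threshold: Optional[float] = None,
                   count: Optional[int] = None) -> List[int]:
    """Return entity identifiers ordered by ascending cost.

    threshold: keep only entities whose cost does not exceed this value.
    count:     keep only this many of the cheapest entities.
    """
    ranked = sorted(costs, key=lambda entity: (costs[entity], entity))
    if threshold is not None:
        ranked = [entity for entity in ranked if costs[entity] <= threshold]
    if count is not None:
        ranked = ranked[:count]
    return ranked


# Illustrative cost values for four identified entities.
costs = {120: 4.0, 300: 1.5, 540: 9.0, 660: 2.5}
print(select_by_cost(costs, threshold=3.0))  # entities with a low cost value
print(select_by_cost(costs, count=1))        # the single cheapest entity
```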




In a preferred embodiment, the cost function might look up or derive cost values for the
entities from a cost database. Such database can be a local (e.g. cache), a centralized, or a
distributed database.

Preferably when no other means are available to derive cost values for entities, such cost
values can be directly derived from a communication with these entities. In a preferred
embodiment, the identified entities are addressed and called to send a response, where for
example the time between the issuance of the request and the arrival of the response at the
retrieving entity is measured and translated into a cost value for the addressed entity. This cost
value might be related to the location of the retrieving entity and the location of the addressed
entity, as the more remote the addressed entity is located the longer the round trip time is.
However, it might be reasonable to determine cost values in this way in advance, in particular
before a replica representing a huge file is to be downloaded. Then it might be a more time
saving approach to address a number of identified entities with short messages for cost
estimating purposes than to start immediately with the download from a randomly
chosen identified entity, which by chance might provide a very slow download rate.
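
Measuring the round trip time of a short message and using it as the cost value can be sketched as below. The `probe` callable stands in for whatever request/response exchange the network offers and is an assumption of this example, as are the name `measure_cost` and the injectable `clock`.

```python
import time
from typing import Callable


def measure_cost(probe: Callable[[int], object], entity: int,
                 clock: Callable[[], float] = time.perf_counter) -> float:
    """Return the round trip time of one short probe as the cost of `entity`."""
    start = clock()
    probe(entity)  # send the short request and block until the response arrives
    return clock() - start


def fake_probe(entity: int) -> str:
    """Stand-in for a real network exchange (illustrative only)."""
    time.sleep(0.01)
    return "response"


print(measure_cost(fake_probe, 300))
```

Probing several identified entities this way before a large download yields one cost value per entity, from which the cheapest can then be chosen.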

In general, a cost value can be represented by an absolute value for the entity in question or by
a relative value taking the cost as cost related to the retrieving entity.

In case a "replica not available" response is received from each of the addressed entities,
another entity is selected from the identified entities for addressing the document related
request to. The "replica not available" response indicates that actually there is no such replica
stored at the relevant entity.

Provided that in a network replicas are stored in ascending order of replica numbers, a
"replica not available" response indicates that there is no replica stored at this entity with this
particular replica number. In addition, one can derive from this system rule in place that there
are no replicas stored at entities anywhere in the network with these entities being assigned to
any higher replica number than this particular one. This means that it is not promising to
address entities that are expected to store replicas of the same document with higher replica
numbers. Insofar it is preferred to select at least one entity to address a new request to from a
set of entities that represent lower replica numbers than the replica number that failed. This
helps tremendously to limit the amount of entities that are worth to be addressed in general.

As a direct result of repeatedly applying the above rule to all answers in turn, if more than one
entity is accessed and all responses indicate a "replica not available", the lowest replica
number out of these addressed replica numbers sets the upper limit for the new set of replica
numbers from which another entity is selected to be addressed with the request.
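
The pruning rule for networks that store replicas in ascending order of replica numbers can be written down directly. The helper name `remaining_replica_numbers` is hypothetical; the logic follows the rule that the lowest failed replica number becomes the upper limit for further attempts.

```python
from typing import Iterable, List


def remaining_replica_numbers(candidates: Iterable[int],
                              not_available: Iterable[int]) -> List[int]:
    """Replica numbers still worth addressing after "replica not available" replies."""
    failed = set(not_available)
    if not failed:
        return sorted(set(candidates))
    upper_limit = min(failed)  # the lowest failed number bounds the new set
    return sorted(n for n in set(candidates) if n < upper_limit)


# N = 6; the entities for replica numbers 4 and 6 both answered "replica not available".
print(remaining_replica_numbers(range(1, 7), [4, 6]))  # [1, 2, 3]
```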

Within this set of remaining entities, the next to be addressed entity/entities can again be
selected according to their cost values. This means that within the set of remaining entities, it
is again the entity/entities addressed in a further step that show the lowest cost in case low
cost is the selection criterion. When in turn none of the responses on such request disclose an
entity providing the requested replica, again the lowest replica number involved in these
requests sets an upper limit for replica numbers which associated entities might be addressed
in a further step.

A stepwise and iteratively applied exclusion of identified and addressed entities that cannot
provide a replica, together with the conclusion that other entities with an assigned replica
number higher than the already addressed ones cannot provide a replica either, reduces the
communication on the network for retrieving the replica by magnitudes.

However, where the addressed entity/entities indicate neither that the replica is not available
nor that the replica is available there, one cannot derive that any other
entities determined by higher replica numbers than the one that failed do generally not provide
the replica in question. Consequently, such entities cannot be excluded from addressing
further requests to. Only the addressed entity/entities can be excluded for now, as it may
remain unreachable for at least the time of this query. Instead, it is preferred to address a
request to another one of the remaining entities which is selected from the identified entities,
and which preferably shows the next best cost value.
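
The iterative procedure of the preceding paragraphs, one entity at a time, can be sketched as a loop. This is an illustration under assumptions: replicas are stored in ascending order of replica numbers, `ask` is a hypothetical callable that returns one of the two reply strings or `None` when the entity gives no usable indication, and `costs` maps replica numbers to the cost of the associated entity.

```python
from typing import Callable, Dict, Optional

AVAILABLE = "available"
NOT_AVAILABLE = "not available"


def locate_replica(costs: Dict[int, float],
                   ask: Callable[[int], Optional[str]]) -> Optional[int]:
    """Return a replica number whose entity holds the replica, or None."""
    remaining = dict(costs)
    while remaining:
        # Address the entity with the next best (lowest) cost value.
        number = min(remaining, key=lambda n: (remaining[n], n))
        reply = ask(number)
        if reply == AVAILABLE:
            return number
        del remaining[number]
        if reply == NOT_AVAILABLE:
            # Ascending storage order: higher replica numbers cannot exist either.
            remaining = {n: c for n, c in remaining.items() if n < number}
        # Any other outcome (e.g. no answer) excludes only the addressed entity.
    return None


stored = {1, 2, 3}  # replica numbers actually deposited so far
costs = {1: 9.0, 2: 3.0, 3: 7.0, 4: 5.0, 5: 1.0, 6: 2.0}
print(locate_replica(costs, lambda n: AVAILABLE if n in stored else NOT_AVAILABLE))
```

In this example the cheapest entity (replica number 5) answers "not available", which rules out number 6 as well, so the second request already goes to the cheapest entity among numbers 1 to 4.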

According to another embodiment of the invention, one or more most preferred entities are
selected from the identified entities, and the document related request is addressed to each
most preferred entity. In this embodiment, the addressed entities are selected according to
their distance from the retrieving entity, where each most preferred entity shows a short
distance from the retrieving entity, wherein a short distance can be defined either absolutely
by applying a threshold or relatively by comparing the detected distances. E.g. when g entities
are to be selected, it is preferred that the g entities are the ones showing the shortest distance
with regard to the retrieving entity out of the identified entities. In some networks and in
particular in some entity identifier notations, the location and consequently a distance measure
can be derived from the associated entity identifier. Such a distance can be regarded as a cost
value and the cost value can be used as criteria for the selection process.
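
Selecting the g closest entities can be sketched once a distance measure is fixed. The sketch below assumes, purely for illustration, that the numeric difference between two entity identifiers reflects their distance; the text only states that some identifier notations allow a distance to be derived. The name `closest_entities` is hypothetical.

```python
from typing import Iterable, List


def closest_entities(own_id: int, entity_ids: Iterable[int], g: int) -> List[int]:
    """Return the g identified entities with the shortest assumed distance."""
    # Assumption: |a - b| on the identifiers is a usable distance measure.
    ranked = sorted(entity_ids, key=lambda e: (abs(e - own_id), e))
    return ranked[:g]


# Retrieving entity 300 picks the two closest of four identified entities.
print(closest_entities(300, [100, 280, 340, 660], 2))  # [280, 340]
```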

Again, it is preferred in this embodiment that upon receiving a "replica not available" message
from the addressed entity, at least one other entity is selected from a set of identified entities
as a second best preferred entity for addressing the document related request to, this set of
identified entities being limited to entities with corresponding replica numbers lower than the
lowest replica number that is associated to the most preferred entity identifier/s. The second
preferred entity is preferably selected from the set of identified entities according to its
distance from the retrieving entity, wherein the closest distance is derived amongst the set of
entities from the associated entity identifiers.

According to other aspects of the present invention, there are provided computer program
elements comprising computer program code means which, when loaded in a processor unit
of a computing entity, configures a processor unit to perform a method as claimed in any one
of the claims 1 to 22 and 25.

In addition, there is provided a computing entity for retrieving a replica of an electronic
document in a computer network, comprising a control unit designed to perform a method for
retrieving a replica of an electronic document in a computer network as specified above or as
specified in any one of the claims 1 to 22.

And there is provided a computing entity for depositing a replica of an electronic document in
a computer network, comprising a control unit designed to perform a method for depositing a
replica of an electronic document in a computer network as specified above or in claim 25.



Advantages and embodiments described with reference to the methods for retrieving or
depositing replicas in a computer network are also considered to be advantages and
embodiments, respectively, of the hereinbefore described computing entities and computer
program elements.



BRIEF DESCRIPTION OF THE DRAWINGS 

Preferred embodiments of the invention will now be described, by way of example, with
reference to the accompanying drawings in which:

Figure 1 shows a part of a network as part of a distributed storage,

Figure 2 shows a diagram of a cost function,

Figures 3a and 3b illustrate flow charts of methods of retrieving a replica in a computer
network, in accordance with an embodiment of the present invention, and

Figure 4 shows another diagram of a cost function.



Different figures may show identical references, representing elements with similar or
uniform content.

DETAILED DESCRIPTION OF THE DRAWINGS

Figure 1 shows computing entities 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 320,
340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560, 580, 600, 620, 640, 660, 680
- also called nodes - being part of a computer network 1. The numbers the computing entities
are provided with are entity identifiers which in the end either are or represent addresses of the
entities.

It is assumed that the document with the document identifier D1 is a very popular document.
Therefore, there are many replicas of this document stored all over the network 1. A
maximum number of replicas distributed over the network is N = 6, where this number is
chosen as a relatively small number for illustration purposes. Hence, a system-wide maximum
of N replicas per document is provided. The following notation is introduced: D1:1 represents
replica number 1 of document D1, D1:2 represents replica number 2 of document D1, and so
on.

As can be derived from Figure 1, the k = N replicas of document D1 are distributed over the
network 1 at locations/entities with entity identifiers 100, 200, 300, 400, 500, 600. On the
other hand, it can also be derived from Figure 1 that another document D2 is also
available in the form of N = 6 replicas, for illustration purposes, which are located at entities
120, 220, 320, 420, 520, 620.

Documents D1 and D2 are deposited over network 1 according to a function

h(i, Dx) = 100 + ((((x-1) * 20) + ((i-1) * 100)) modulo 600)

with i being the replica number, and i = [1, ..., N],

and Dx being a document identifier with x being a number 1, 2.

One of the entities shown in Figure 1, or another entity outside the scope of the network
shown in Figure 1, has applied this function h(i, Dx) for depositing replicas of the documents
D1 and D2, and has actually deposited all 6 replicas of each document over the network.

Only for illustration purposes, there might be other documents deposited over the network
following e.g. a function like h(i, Dx) = 100 + ((x * 20 + i * (120 - 20 * x) - 120) modulo
600), resulting in an overlap of deposited replicas of different documents, for example at
entity 300.
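The example deposit function used for documents D1 and D2 can be written out as a short sketch. The names `h`, `candidate_entities` and `N` are chosen here for illustration only; the arithmetic follows the function given for Figure 1, with the document Dx passed as its number x.

```python
N = 6  # system-wide maximum number of replicas in the Figure 1 example


def h(i: int, x: int) -> int:
    """Entity identifier assigned to replica number i of document Dx."""
    return 100 + (((x - 1) * 20 + (i - 1) * 100) % 600)


def candidate_entities(x: int, n: int = N) -> list[int]:
    """Entities that might hold replicas 1..n of document Dx."""
    return [h(i, x) for i in range(1, n + 1)]


print(candidate_entities(1))  # [100, 200, 300, 400, 500, 600]
print(candidate_entities(2))  # [120, 220, 320, 420, 520, 620]
```

Because the function is deterministic and known system-wide, a depositing entity and a retrieving entity compute the same list without consulting any directory.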

In more general words, for each document Dx, there is a number k of replicas that exist for Dx.
These replicas are stored at addresses h(i, d), with 1 <= i <= k. h() is preferably implemented as a
pseudo-random hash function. This means that each address is mapped to one of the machines
in the network using the DHT.

Normally, the maximum number N of allowed replicas can be chosen very large, as the
average number of the operations dominating the cost (i.e. actual requests) can be limited to
log(N + 1). For example, choosing N=1023 results in on average 10 messages or less to locate
the closest replica and initiate download.

A retrieving entity, which might for example be the entity with the identifier 140, might now wish
to access a replica of the document identified as D1. Computing entity 140 has available the same
function h(i, Dx) = 100 + ((((x-1) * 20) + ((i-1) * 100)) modulo 600) for determining entities
where replicas of document D1 might be available. When applying this function for all
replica numbers i, retrieving entity 140 will get the entity identifiers 100, 200, 300, 400, 500,
600 as a result.



Having determined these locations where a replica of document D1 might be available for
download, retrieving entity 140 now applies a cost function. Knowing that the entity
identifiers increase from the right hand part of the network 1 to the left hand part of the
network 1, at least when considering only the "hundreds" digit of the identifiers, and being
aware of its own location in the very right hand part of the network, the cost function applied
by entity 140 might be a comparison of all the entity identifiers delivered by the function
h(i, Dx). The result might be a relation of 100 < 200 < 300 < 400 < 500 < 600. According to
this result, it can be derived that the entity with identifier 100 shows the lowest cost value as
being the first in the rank, and entity 600 shows the highest cost value as being the last in
the rank. Translated into written language, the cost values indicate that the entity identified as
entity 100 is probably the closest one to retrieving entity 140. As the retrieving entity is interested
in a quick download, entity 100 seems the most cost efficient for communicating with and
thus for a download.

As a consequence, the retrieving entity addresses a request on document D1 to entity 100,
to which entity 100 responds with a positive answer ("replica of D1 available here") or, in an
alternate embodiment, straight with a transmission of the replica of D1.

Figure 2 shows a diagram of the costs associated to each identified replica of D1, after
applying a cost function at entity 140. The costs are determined in relation to the location of
retrieving entity 140 in the network and represent an evaluation of the distances to the
retrieving entity 140 based on the knowledge of the network structure as explained above.

In another embodiment, it is assumed that entity 420 is now the retrieving entity that is
looking for replicas of document D1 and that identifies entities 100, 200, 300, 400, 500, 600
as possible providers of such a replica. When applying the cost function immanent to entity 420,
it delivers as a result that entity 400 is probably the closest entity to retrieving entity 420 to get
the replica from. Figure 4 shows a diagram of the costs associated to each identified replica of
D1, after applying a cost function at entity 420. The costs are determined in relation to the
location of retrieving entity 420 in the network and represent an evaluation of the distances to
the retrieving entity 420 based on the knowledge of the network structure as explained above.
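As an illustration of how such a ranking might be produced, the sketch below assumes a cost equal to the absolute difference between the candidate's identifier and the retrieving entity's own identifier. This particular cost function is an assumption made for the example (the text only requires that a distance can be estimated from identifiers); `cost` and `rank_by_cost` are hypothetical names.

```python
def cost(entity_id: int, own_id: int) -> int:
    """Assumed distance estimate: numerically close identifiers are taken to be nearby."""
    return abs(entity_id - own_id)


def rank_by_cost(candidates: list[int], own_id: int) -> list[int]:
    """Candidates ordered from lowest to highest estimated cost."""
    return sorted(candidates, key=lambda e: cost(e, own_id))


candidates = [100, 200, 300, 400, 500, 600]
print(rank_by_cost(candidates, 140))  # [100, 200, 300, 400, 500, 600]
print(rank_by_cost(candidates, 420))  # [400, 500, 300, 600, 200, 100]
```

Under this assumption, entity 140 ranks entity 100 first and entity 420 ranks entity 400 first, matching the two examples described for Figures 2 and 4.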

However, now only entities 100, 200, 300 can really provide a replica of document D1, as an
administrator of document D1 might not have distributed replicas yet to the entities 400, 500
and 600 by not having applied the hash function for these replica numbers yet.

Hence, when retrieving entity 420 now sends a request to closest entity 400, entity 400 might
now respond that there is no replica of document D1 available at entity 400. Requesting entity
420 now might try to address another request to entity 500 as being the next preferred and
next closest entity to get the replica from. However, given that the entities in the network are
filled up with replicas by ascending replica number, it is apparent that when entity 400 cannot
provide the replica number i = 4, any replicas with higher replica numbers are not available at
the assigned entities. Thus, any request directed now to another entity than entity 400 within
the identified entities 100 to 600 has to be directed to an entity which might provide a replica
with a replica number smaller than i = 4. Thus, entities 100, 200, 300 may form the set of
entities associated to replica numbers smaller than i = 4 which actually might have a replica of
document D1 stored. As entity 300 shows the lowest costs among the remaining entities, entity
300 may be approached next by retrieving entity 420 and may be approached with success.
However, if even entity 300 would not have a replica of document D1 available, the set of
entities to be approached next is further limited. By applying this scheme in an iterative way, lots
of communication to other entities can be saved.

As already touched above, some distributed hash tables (DHT) preserve locality, i.e., by 
knowing that two addresses are similar, one can conclude that the entities serving those 
addresses will be close-by as well. This property can be used to estimate the distance - as 
measured according to the metric used by the DHT - to another entity. A DHT supporting
such a system is for example Mithos in "Efficient Topology-Aware Overlay Network",
Marcel Waldvogel and Roberto Rinaldi, ACM Computer Communications Review, January
2003, Volume 33, Number 1, pages 101 - 106, which is hereby incorporated by reference.
However, other cost functions can be applied. Such a cost function can be either part of the
network itself or on top of it, or individual as part of each entity.

According to another embodiment of the invention, when a client wants to locate the
document with ID d, it a priori only needs to know N, the maximum number of replicas, and
the hash function used. The process is then as follows:

1. r := N

2. The client determines which of the nodes that would replicate the document would be
closest, by checking the distance of the serving node for all possible values of the
replica number i (1 <= i <= r). It picks the one which is closest - assume this is the node for
replica number g - and asks it for the document. In a well-suited DHT, or augmented by an
appropriate local database, determining the closest node does not require sending any messages.

3. If the asked node replies with the document, everything is fine; else, r := g-1 and the
algorithm continues at step 2 (caching the calculation of the distances is recommended).

This is a way to return the closest possible replica, as the search starts with the closest node. If
it in fact holds a replica, the closest replica was found. If it does not, all replicas numbered g or
higher can be excluded from the search, as the replica allocation policy guarantees the replica
numbers to be contiguous and starting from 1.

It is expected to stop after log N steps. As the hash function is pseudo-random, the closest
node in each interval will on average be in the middle. Thus, an average of 50% of the
candidate nodes can be excluded. It is guaranteed to make progress, as at least one node is
always excluded. And it is guaranteed to find the document, if at least one replica (numbered
1) exists. No further information is required.
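The three-step process above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `locate_replica`, `cost` and `has_replica` are hypothetical names, `has_replica` stands in for sending a request to a node, and unreachable or overloaded nodes are not modelled.

```python
from typing import Callable, Optional


def locate_replica(
    doc: int,
    n_max: int,
    h: Callable[[int, int], int],
    cost: Callable[[int], int],
    has_replica: Callable[[int, int], bool],
) -> Optional[int]:
    """Return the address that served the document, or None if no replica exists."""
    # Compute each candidate address once (caching is recommended in step 3).
    address = {i: h(i, doc) for i in range(1, n_max + 1)}
    r = n_max  # step 1
    while r >= 1:
        # Step 2: replica number g whose node is cheapest among 1..r.
        g = min(range(1, r + 1), key=lambda i: cost(address[i]))
        if has_replica(address[g], doc):
            return address[g]
        # Step 3: replica numbers are contiguous from 1, so g and above are excluded.
        r = g - 1
    return None


def example_h(i: int, x: int) -> int:
    return 100 + (((x - 1) * 20 + (i - 1) * 100) % 600)


# Figure 1 scenario: retriever 420, replicas of D1 only at entities 100, 200, 300.
found = locate_replica(1, 6, example_h, lambda a: abs(a - 420),
                       lambda a, d: a in {100, 200, 300})
print(found)  # 300
```

In this scenario the sketch first asks entity 400, receives a negative answer, narrows the range to replica numbers 1 to 3, and then succeeds at entity 300 without ever contacting entities 500 or 600.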

Thus, the retrieving method of this embodiment can be characterized as a randomized binary
search from the perspective of the pivot selection. It can be extended to (k+1)-ary search, by
concurrently probing the k closest nodes. Then, the retrieving node might not want to directly
ask the identified and where applicable cost-selected nodes for the document, but only ask
whether the node has it, in order to prevent multiple transmissions.

Back to Figure 1, the parameters that are known and invariant system-wide are the maximum
number of replicas N, and the hash function h() used.

The approach for retrieving and/or depositing replicas in a computer network as proposed can
also be used as a backup system for a site- or organization-wide replicated networked storage
system based on a centralized directory structure for replica management. In such a scenario,
the directory server would constitute a single point of failure. In this case, only a subset of the
replicas needs to be placed according to the above system. Other replicas could be placed as
desired based on locality patterns (e.g. guaranteed off-site storage) or access patterns (e.g.
close to the clients). Then, the data stored according to this invention would still be accessible
even if the centralized directory failed or became otherwise inaccessible. Even though
performance might be reduced depending on the size and placement of the subset of replicas
that are assigned according to the present invention, access to the data would still be
guaranteed.

In very localized systems/networks - i.e. within a single building or site, where all systems are
basically equally well accessible - the distance function can be replaced by a random
function, to achieve efficient load balancing.

According to another embodiment of the present invention, a cost function represented by a
distance estimation function as described above and based on information that can be derived
from the entity identifiers themselves might not be available. Instead of selecting the closest
entities out of the entities identified as replica deposits, random entities might be chosen out of
the identified entities and addressed. The response to such a request may carry information such as
replica load or round trip time, which can help in choosing the best match in case of multiple
matches. This random selection in combination with a cost estimation does not guarantee to
detect the closest node; however, it may lead the retrieving entity to the eventual location of a
replica. A relatively large number k of addressed entities results in more probes and thus a
higher probability of finding a close-by entity.

In case a cost function like the distance estimation function as explained above is expensive,
but anyway cheaper than sending a message to the entity, only a subset of the identified
entities might be chosen for having their respective cost values calculated. Then, the
best entities from a cost value point of view are selected for sending a request to.

In both of the above embodiments, it is not necessarily needed to identify all N entities that
might be in the possession of a replica. Instead, only a smaller number k < N of replica
numbers might be selected for determining associated entities.

In case approached entities are unreachable or return "overload" messages, which can be
interpreted as "cannot currently serve document, please pick another replica", or when a
time-out is reached after an approach, it is impossible to make an estimation whether the
nodes do not have the replica available or whether they have it available but cannot deliver it
currently for any reason. With regard to the method of depositing a replica, a possible return
message can also indicate that the addressed entity should indeed carry a replica, but that
insufficient storage is available. The algorithm would still work correctly, with minimal
impact on its performance. So, if none of the addressed entities returns either a commitment of
the entity to send the document or an indication that it does not have the document, the search
range for addressing further requests cannot be limited. Instead, the next best entities are
selected from the current entity range, e.g. using a cost estimation of any kind, and these
entities are approached next.

Figure 3 shows two flow charts embodying the aspect of retrieving replicas according to the
invention. Figure 3a) shows a method for a topology-aware network whereas Figure 3b) shows
a method for a non-topology-aware network. Topology-aware in this context means that a
location of an entity can at least be roughly derived from its address/identifier, or can be
derived from additional, relatively inexpensive probes or measurements.

For both of the charts, the following notation is applied:

- N is the maximum number of replicas.

- h(m, d) is a hash function on the replica number m and the document ID d.

- c(a) is a cost function giving the cost to address a.

- k is the number of probes per step.

With regard to Figure 3a), in step S1, r as the maximum valid number of possible replicas for
the document is initialized to N. In step S2, a_m = h(m, d), for all m in [1, N], is calculated. In
step S3, c_m = c(a_m), for all m in [1, N], is calculated. In step S4, k indices m_1 ... m_k out
of the set [1, r] are picked, such that the corresponding c_m are minimal. In step S5, the
corresponding addresses a_m are probed. Optionally, the probes are terminated after only k-s
answers. If, in step S6, any of the probes returned document availability [1], in step S7 the best
probe/address is determined according to any metric, and in step S8 the replica is returned from
this best address.

If in step S6 none of the addressed entities returned a document [2], then in step S9 r is set to
min(m_i)-1 and the method is continued at step S4.

With regard to Figure 3b), in step R1, r as the maximum valid number of possible replicas for
the document is set to N. In step R2, k indices m_1 ... m_k are randomly picked out of the set
[1, r]. In step R3, the addresses a_i of the replicas to be probed are determined by calculating
a_i = h(m_i, d). In step R4, the addresses are probed. The probing is optionally terminated
after only k-s answers in order not to waste time waiting for the last few answers, which may
never come, as the addressed node may be down or unreachable. This may improve
performance at the cost of not listening to all nodes. It is unlikely that the nodes which have not
been waited for will be good candidate replicas, as the reason that they are slow in responding
is likely due to overload of the network or the replica itself.

If in step R5 any of the probes returned document availability [1], in step R6 the best
probe/address is determined according to any metric, and in step R7 the replica is returned
from this best address. If none of the addressed entities returned a document
[2], then in step R8 r is set to min(m_i)-1 and the method is continued at step R2.
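Both flow charts can be approximated by one loop that differs only in how the k replica numbers are picked. The sketch below is an illustration under assumptions: `retrieve` and `probe` are hypothetical names, `probe` stands in for a network request that answers whether a node holds the document, and the optional early termination after k-s answers as well as unreachable nodes are not modelled.

```python
import random
from typing import Callable, Optional


def retrieve(
    doc: int,
    n_max: int,
    k: int,
    h: Callable[[int, int], int],
    probe: Callable[[int, int], bool],
    cost: Optional[Callable[[int], float]] = None,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Return an address holding the document, or None if no replica was found."""
    rng = rng or random.Random()
    r = n_max  # S1 / R1
    while r >= 1:
        window = list(range(1, r + 1))
        if cost is not None:
            # S2-S4: the k replica numbers whose addresses have minimal cost.
            picked = sorted(window, key=lambda m: cost(h(m, doc)))[:k]
        else:
            # R2: k replica numbers picked at random.
            picked = rng.sample(window, min(k, r))
        # S5 / R3-R4: probe the corresponding addresses.
        hits = [h(m, doc) for m in picked if probe(h(m, doc), doc)]
        if hits:
            # S7 / R6: choose the best hit; here by cost if one is available.
            return min(hits, key=cost) if cost is not None else hits[0]
        # S9 / R8: none of the probed entities had the document.
        r = min(picked) - 1
    return None
```

With the Figure 1 example, k = 2, a cost of `abs(a - 420)` and replicas at entities 100, 200 and 300, the cost-based variant probes 400 and 500 first, narrows the range to replica numbers 1 to 3, then probes 300 and 200 and returns 300.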

In the following section, some more embodiments with regard to all aspects of the present
invention are described.

In case a document related request is only sent to one of the identified entities according to the 
cost estimation, such entity might need a long time to respond in case the load at this machine
is very high. Such an addressing mechanism can be further improved by selecting at least one
other additional entity amongst the identified entities in a random way and sending another
request to this at least one more entity. This way, an entity that is close and thus selected to be
addressed according to the introduced scheme but that is overloaded and will thus not respond
or respond only very slowly can be detected and can automatically be offloaded from future
queries, i.e. not being taken into account for future requests. On the other hand, the randomly
selected entity might in such a case provide a requested response in due time without having
to select new close-by entities for addressing new requests to.

According to another embodiment, the number of issued requests and thus search time can be
restricted at the cost of possibly excluding a small set of nodes from the selection process. Thus, a
suboptimal neighbor may be selected, or, in the case of massive network outages combined
with a very bad cost function, not all potential replicas may be searched. The expected
performance is log(N) steps/requests to be sent out, where N is the maximum number of
replicas. Yet, the worst case is N steps, namely, if the cost function is a monotonically
decreasing function of the replica numbers. The effect of such a cost function is that each step
will only eliminate a single potential replica, not half of them, as expected. The solution to
this problem is to have a minimum guaranteed progress in each step. A potential class of
functions which can be used to guarantee a minimum of progress includes:

Bound r (the window of potential probes) by at most N/base^(steps-delta), where base is the
base of the exponent, such as 2; steps is the number of steps that have been performed so far;
and delta is a worst-case rate. By choosing base=2, delta=2, you require at most delta
additional steps, and your range will never be wider by more than a factor of base^delta of the
expected case, assuming binary search (number of probes per step k=1). Still, the quality of
the node found is bounded by a factor of base. The quality factor is defined as follows: Given
r_d actual replicas, order them by increasing cost. The unmodified (unbounded) search would
always find the first (=best) of those. The modified case will find one which is ordered among
the first r_d * (1-1/base) items, so it is never worse than the r_d * (1-1/base) item.
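A possible way to apply such a bound is sketched below. It assumes that the bound reads N / base^(steps - delta), which is consistent with the statement that the range is never wider than base^delta times the expected case; `bounded_window` is a hypothetical helper name, and the floor of 1 is an added safeguard so that replica number 1 is always still considered.

```python
import math


def bounded_window(r: int, n_max: int, steps: int, base: int = 2, delta: int = 2) -> int:
    """Cap the search window r at N / base**(steps - delta), never below 1."""
    # For steps < delta the exponent is negative, so the cap exceeds N and has no effect.
    cap = math.floor(n_max / base ** (steps - delta))
    return max(1, min(r, cap))


print(bounded_window(1000, 1024, 5))  # 128, i.e. 1024 / 2**(5 - 2)
print(bounded_window(50, 1024, 5))    # 50, already narrower than the cap
```

In a retrieval loop, the helper would be applied to r at the start of each step, so that a cost function which eliminates only one candidate per step can no longer force N steps.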

"Simple Load Balancing for Distributed Hash Tables", J. Byers et al., 
International Peer-to-Peer Symposium (IPTPS), February 2003, illustrates that using two
probes within a DHT is welcome as it distributes load better: An item is typically stored at the
less loaded of two (or more) possible locations.

CLAIMS

1. Method for retrieving a replica of an electronic document in a computer network,
comprising:

• selecting at least one replica number,

• by applying a given function, the function requiring the replica number and a document
identifier as input: determining at least one entity identifier, each entity identifier
representing an entity in the network that might provide the replica,

• addressing a document related request to at least one of the identified entities.

2. Method according to claim 1, comprising

• selecting k = N replica numbers, wherein N is a maximum number for replicas,

• by applying the given function k times: determining k entity identifiers.

3. Method according to claim 1, comprising

• selecting k replica numbers from a maximum number of N replicas with k < N,

• by applying the given function k times: determining k entity identifiers.

4. Method according to claim 3, 
wherein k <= 5.

5. Method according to claim 3, 
wherein k = 1.

6. Method according to any one of the preceding claims, comprising 
addressing the document related request to all identified entities. 

7. Method according to any one of the preceding claims 1 to 4, comprising
addressing the document related request to only selected ones of the identified entities.

8. Method according to any one of the preceding claims 1 to 4, comprising
addressing the document related request only to one entity selected from the identified
entities.

9. Method according to any one of the preceding claims 1 to 5, comprising 
calculating a cost function for each of the k entities, the cost function providing a cost value as 
result which indicates a cost to address the relevant entity. 

10. Method according to claim 9 in combination with claim 7 or claim 8,
wherein each entity to be addressed is selected from the identified entities due to the
associated cost value.

11. Method according to claim 10,

wherein the addressed entity/entities is/are the one/s showing the lowest cost value/s.

12. Method according to claim 6 or claim 7,

wherein cost values for the addressed entities are derived from communication with these
entities.

13. Method according to claim 6 or claim 7,
wherein cost values for the addressed entities are derived from a cost database.

14. Method according to any one of the preceding claims,

wherein upon receiving a "replica not available" response from each of the addressed entities,
another entity is selected from the identified entities for addressing the document related
request to.

15. Method according to claim 14,

wherein the other entity is selected from the identified entities by choosing an entity with an
associated replica number that is lower than the replica number associated to the entity/entities
the previous request was addressed to.



16. Method according to any one of the preceding claims,
wherein upon any indication from the addressed entity/entities that states neither that the
replica is not available nor that the replica is available there, another entity is selected from the
identified entities for addressing the document related request to.

17. Method according to claim 16,

wherein the other entity is selected due to an associated cost value.

18. Method according to any one of the claims 1 to 4, comprising

• selecting from the identified entities at least one most preferred entity, and
• addressing the document related request to each most preferred entity.

19. Method according to claim 18,

wherein each most preferred entity is selected according to its distance from the retrieving
entity.

20. Method according to claim 19,

wherein the distance of an entity is derived from the associated entity identifier.

21. Method according to any one of the preceding claims 18 to 20,

wherein upon receiving a "replica not available" message from the addressed entity, at least
one other entity is selected from a set of identified entities as a second best preferred entity for
addressing the document related request to, this set of identified entities being limited to
entities with corresponding replica numbers lower than the replica number that is associated to
the most preferred entity identifier.

22. Method according to claim 19,

wherein the second preferred entity is selected from the set of identified entities according to
its distance from the retrieving entity, wherein the closest distance is derived from the
associated entity identifiers.

23. A computer program element comprising computer program code means which, when
loaded in a processor unit of a computing entity, configures the processor unit to perform a
method as claimed in any one of the preceding claims.

24. Computing entity for retrieving a replica of an electronic document in a computer 
network, comprising 

a control unit designed to perform a method according to any one of the claims 1 to 22. 

25. Method for depositing a replica of an electronic document in a computer network,

• selecting a replica number,

• by applying a given function, the function requiring the replica number and a document
identifier as input: determining an entity identifier, the entity identifier representing an
entity in the network,

• addressing the identified entity for replica depositing purposes.

26. A computer program element comprising computer program code means which, when
loaded in a processor unit of a computing entity, configures the processor unit to perform a
method as claimed in claim 25.

27. Computing entity for depositing a replica of an electronic document in a computer
network, comprising

a control unit designed to perform a method according to claim 25.

ABSTRACT



There are introduced ways for retrieving or depositing a replica of an electronic document in a
computer network. After having selected at least one replica number, a given function is
applied. The function requires as input the replica number and a document identifier. The
function returns as a result at least one entity identifier, each entity identifier representing an
entity in the network that might provide the replica. In a next step, a document related request
is addressed to at least one of the identified entities.